# A pro-government disinformation campaign on Indonesian Papua

## Overview
These Jupyter notebooks contain the code used to scrape tweets, extract fields from them, and detect their language for the article _A pro-government disinformation campaign on Indonesian Papua_, published in the _Harvard Kennedy School Misinformation Review_, October 2022, Volume 3, Issue 5.

## Software environment
The connection to the Twitter API v2 for Academic Research was done in Python using Jupyter notebooks. The Jupyter notebooks were run on a Melbourne Research Cloud virtual machine, provided by Research Computing Services at the University of Melbourne. The operating system for the virtual machine was Ubuntu 18.04.6 LTS.
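As a rough illustration of what a request to the Twitter API v2 full-archive search endpoint looks like, the sketch below builds the URL and headers for one page of results without sending anything. It is not the code from the notebooks, which rely on `twitterutils`; the function name `build_search_request`, the keyword, the date range, and the selected tweet fields are all placeholders chosen for this example.

```python
from urllib.parse import urlencode

# Full-archive search endpoint of the Twitter API v2 (Academic Research access).
SEARCH_ALL_URL = "https://api.twitter.com/2/tweets/search/all"


def build_search_request(query, start_time, end_time, bearer_token, next_token=None):
    """Return (url, headers) for one page of a full-archive search.

    Hypothetical helper for illustration only; nothing is sent over the network.
    """
    params = {
        "query": query,
        "start_time": start_time,  # ISO 8601 timestamp, UTC
        "end_time": end_time,
        "max_results": 500,  # upper limit per page for this endpoint
        "tweet.fields": "author_id,created_at,lang",  # illustrative selection
    }
    if next_token is not None:
        # Pagination token taken from the "meta" object of the previous page.
        params["next_token"] = next_token
    url = f"{SEARCH_ALL_URL}?{urlencode(params)}"
    headers = {"Authorization": f"Bearer {bearer_token}"}
    return url, headers


# Placeholder keyword, dates, and token; not the values used in the study.
url, headers = build_search_request(
    query="papua",
    start_time="2021-01-01T00:00:00Z",
    end_time="2021-06-01T00:00:00Z",
    bearer_token="YOUR_BEARER_TOKEN",
)
```

The resulting `url` and `headers` could then be passed to any HTTP client, repeating the call with each page's `next_token` until none is returned.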

R was used for further data wrangling, analysis, and creating tables and figures.

## Dependencies
In addition to standard Python libraries, the notebooks use the custom-written package [`twitterutils`](https://gitlab.unimelb.edu.au/mdap-public/twitterutils).

## Data
Due to [restrictions on the Twitter API v2 for Academic Research](https://developer.twitter.com/en/developer-terms/more-on-restricted-use-cases), we can provide only a severely limited version of the dataset used in this analysis.

## Order of execution
The notebooks were run in the order of their numbering. No other scripts have to be run apart from the notebooks.

| Notebook | Description |
|----------------------|---------------------------------------------------------------------------------------------------------------|
| 1_20210608_twitter_scrape.ipynb | Scrapes Twitter for tweets containing keywords of interest and saves the combined results as a JSON file. |
| 2_2021-07-08_twitter_extract.ipynb | Extracts fields of interest from the JSON file and saves them in tabular format to a pickled file. |
| 3_2021-10-13_language_detection | Uses the polyglot package to detect tweets in Indonesian that may have been missed by the Twitter language detection algorithm. Adds language columns to the pickled file and saves it in .Rds format for further analysis in R. |
| 4_2022-07-28_user_statuses | Takes the list of unique author IDs from the scrape and checks whether the accounts have been suspended by Twitter or no longer exist. |
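The extraction step described for notebook 2 can be sketched roughly as follows. This is a minimal illustration rather than the notebook's code: it assumes the scrape is stored as a list of Twitter API v2 response pages, each with a `data` array of tweet objects, and it uses only the standard library (a list of dictionaries instead of a data frame). The names `FIELDS`, `extract_rows`, and `save_rows`, the chosen fields, and the sample tweets are all invented for the example.

```python
import json
import pickle

# Illustrative selection of fields; the notebook may keep a different set.
FIELDS = ("id", "author_id", "created_at", "lang", "text")


def extract_rows(pages, fields=FIELDS):
    """Flatten response pages into one row (dict) per tweet.

    Fields missing from a tweet object are stored as None.
    """
    rows = []
    for page in pages:
        for tweet in page.get("data", []):
            rows.append({field: tweet.get(field) for field in fields})
    return rows


def save_rows(rows, path):
    """Write the extracted rows to a pickled file."""
    with open(path, "wb") as handle:
        pickle.dump(rows, handle)


# Invented sample data in the shape of Twitter API v2 search responses.
sample_pages = json.loads("""
[
  {"data": [{"id": "1", "author_id": "10", "lang": "in",
             "created_at": "2021-05-01T00:00:00.000Z", "text": "contoh"}],
   "meta": {"result_count": 1, "next_token": "abc"}},
  {"data": [{"id": "2", "author_id": "11", "text": "example"}],
   "meta": {"result_count": 1}}
]
""")

rows = extract_rows(sample_pages)
```

Calling `save_rows(rows, "tweets.pkl")` (a placeholder file name) would then write the table to disk for the later notebooks to load.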